
Add HIP backend #135

Open
amd-asalykov wants to merge 8 commits into ScalingIntelligence:main from amd-asalykov:main

Conversation

amd-asalykov commented Jan 22, 2026

How to install:

uv add torch --index pytorch=https://download.pytorch.org/whl/rocm7.1

Run on MI350X/MI355X:

uv run python scripts/generate_and_eval_single_sample.py gpu_arch=gfx950 backend=hip dataset_src=huggingface level=1 problem_id=22 server_type=google model_name=gemini/gemini-2.5-flash

Run on MI300X/MI325X:

uv run python scripts/generate_and_eval_single_sample.py gpu_arch=gfx942 backend=hip dataset_src=huggingface level=1 problem_id=22 server_type=google model_name=gemini/gemini-2.5-flash

simonguozirui (Collaborator) commented Feb 21, 2026

Thanks so much @amd-asalykov, validating on a bare-metal MI350X right now. Also thanks @laasya-konidala for setting things up, checking the codebase, and verifying!

salykova commented Feb 21, 2026

@simonguozirui you might have noticed that in the current implementation we rely on os.environ["CXX"] = "hipcc" in src/kernelbench/prompts/model_new_ex_add_hip.py to make PyTorch's load_inline work with HIP kernels. We don't explicitly tell LLMs to include this line in the generated kernels; instead, we expect that they will include it automatically based on the model_new_ex_add_hip.py example. As an alternative, we could introduce a backend-specific prompt and explicitly ask LLMs to include os.environ["CXX"] = "hipcc".
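For reference, a minimal sketch of the pattern the in-context example relies on. The kernel source and names here are illustrative, not copied from model_new_ex_add_hip.py:

```python
import os

# Setting CXX to hipcc before invoking torch.utils.cpp_extension.load_inline
# routes the extension build through the HIP compiler directly, instead of
# PyTorch hipifying what it assumes is CUDA code.
os.environ["CXX"] = "hipcc"

# Illustrative HIP source; a real generated kernel would follow the
# model_new_ex_add_hip.py example.
hip_source = r"""
#include <hip/hip_runtime.h>

__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}
"""

# On a ROCm machine this would then be compiled with something like:
# from torch.utils.cpp_extension import load_inline
# mod = load_inline(name="add_hip", cpp_sources="", cuda_sources=hip_source)
```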

simonguozirui (Collaborator) commented

Gotcha. os.environ["CXX"] = "hipcc" triggers the HIP compiler and prevents PyTorch from running hipify (we hit this issue last year when @willhu-jpg and I tried to implement AMD support; it would just hipify CUDA code, #37). We can keep that as part of the in-context example, which becomes part of the prompt automatically, the same way we treat other backends like TK that need a special include library (no need for backend-specific warnings and instructions).

Added a few guardrails to ensure separate AMD and NVIDIA code paths.
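A hypothetical sketch of what such a guardrail could look like; the function name and checks are illustrative assumptions, not the actual KernelBench code:

```python
def resolve_backend(backend: str, gpu_arch: str) -> str:
    """Reject mismatched backend/arch combinations so the AMD and NVIDIA
    code paths cannot silently cross over (illustrative sketch)."""
    is_amd_arch = gpu_arch.startswith("gfx")  # e.g. gfx942, gfx950
    if backend == "hip" and not is_amd_arch:
        raise ValueError(f"backend=hip expects an AMD gfx arch, got {gpu_arch!r}")
    if backend == "cuda" and is_amd_arch:
        raise ValueError(f"backend=cuda cannot target AMD arch {gpu_arch!r}")
    return backend
```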

Two more small things to close this off:

  • reward-hack checking: a HIP kernel must use HIP-related keywords (regex-based matching)
  • L2 cache thrashing: to clear the cache we currently allocate a big tensor to thrash it, so for AMD we need to check whether that tensor size still works well there
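The regex-based check could look roughly like this; the keyword list and hit threshold are illustrative assumptions, not the actual implementation:

```python
import re

# HIP-specific patterns a genuinely HIP kernel is expected to contain;
# a submission with too few hits is flagged as a potential reward hack.
HIP_KEYWORDS = [
    r"hip_runtime",         # #include <hip/hip_runtime.h>
    r"__global__",          # kernel qualifier
    r"hipLaunchKernelGGL",  # HIP kernel-launch macro
    r"hipMalloc|hipMemcpy", # HIP runtime API calls
]

def looks_like_hip_kernel(src: str, min_hits: int = 2) -> bool:
    """Return True if the source matches at least min_hits HIP patterns."""
    hits = sum(1 for pat in HIP_KEYWORDS if re.search(pat, src))
    return hits >= min_hits
```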
